How Much Information can be Obtained by a Quantum Measurement? 
How much information about an unknown quantum state can be obtained by a measurement? We 
propose a model independent answer: the information obtained is equal to the minimum entropy of 
the outputs of the measurement, where the minimum is taken over all measurements which measure 
the same "property" of the state. This minimization is necessary because the measurement outcomes 
can be redundant, and this redundancy must be eliminated. We show that this minimum entropy 
is less than or equal to the von Neumann entropy of the unknown states. That is, a measurement can 
extract at most one meaningful bit from every qubit carried by the unknown states. 



I. INTRODUCTION 

Quantum mechanics has at its core a fundamental statistical aspect. Suppose you are given a single quantum 
particle in a state $|\psi\rangle$ unknown to you. There is no way to find out what $|\psi\rangle$ is: to find it out you need an infinite 
ensemble of quantum particles, all prepared in the same state. Indeed, the different properties which characterize the 
state are, in general, complementary to one another; measuring one disturbs the rest. Only if an infinite ensemble is 
given can one find out the state. But infinite ensembles don't exist in practice. Given a finite ensemble of identically 
prepared particles, how well can one estimate the state? The problem is a fundamental one for understanding the 
very basis of quantum mechanics. It has been investigated by many authors, see for instance [?], and it constitutes 
probably the oldest problem in what is at present called "quantum information". Here we approach this problem 
from a new point of view which, we think, leads to a deeper understanding. 

What is the optimal way to estimate the quantum state given a finite ensemble? As such the question is not well 
posed. Indeed, since we cannot completely determine the state, i.e. completely determine all its properties, we must 
decide which particular property we want to determine. For an ensemble of spins, for example, estimating as well as 
possible the mean value of the z spin component is, obviously, a different question from estimating as well as possible 
the mean value of the x spin component. 

But things are in fact even more complicated. The apparently benign words "as well as possible" in the previous 
paragraph are not well defined. Indeed, "as well as possible" actually means "as well as possible given a specific 
measure of what 'well' means". Obviously, one can imagine many different measures. For example, suppose that a 
source emits states $|\psi_i\rangle$ with probability $p_i$. The problem is to design a measurement at the end of which we must 
guess which state was emitted. Let the guess be $|\phi_j^{guess}\rangle$, and let the measure of success (fidelity) be 

    $f = |\langle \psi_i | \phi_j^{guess} \rangle|^2$ ,    (1)

i.e. the absolute value squared of the scalar product between the true state $|\psi_i\rangle$ and the guess $|\phi_j^{guess}\rangle$. The goal is 
to optimize the measurement such that it yields the highest average fidelity 

    $F = \sum_{i,j} p_i \, p(j|i) \, |\langle \psi_i | \phi_j^{guess} \rangle|^2$ ,    (2)

where $p(j|i)$ is the probability to make guess $j$ if the state is $|\psi_i\rangle$. On the other hand, one can imagine another fidelity 
function, such as 

    [equation missing in the source]    (3)

Or one could try to optimize the mutual information 

    $I = \sum_{i,j} p_i \, p(j|i) \, \ln \frac{p(j|i)}{p_j}$ ,  with  $p_j = \sum_i p_i \, p(j|i)$ ,    (4)

or any other measure. 

The important point to notice about the above different problems is that the different fidelities (2-4) not only define 
different scales according to which we measure the degree of success in estimating the state, but also, implicitly, define 
which property of the state we are actually estimating. If all the different fidelities were to lead to the same optimal 
measurements, we could say that we learn the same property about the state, just expressed in a different way. 
However, the different fidelities will in general lead to different optimal measurements, which means that in each case 
we learn a different property of the system. 

To summarize, in general each particular estimation problem is completely different from the others: they measure 
different properties and their degree of success is measured on different scales, with the scales also defining implicitly 
which property we estimate. 

That one can learn different properties is a fact of life inherent to quantum mechanics. But there is no reason not 
to use the same scale to gauge how successful we have been in learning the property we decided to measure. The aim 
of this paper is to propose such a universal scale, and in the process to introduce a novel approach to quantum state 
estimation. 

II. MAIN IDEA 

The central point of our approach starts from a simple but fundamental question: what do we actually learn from a 
measurement on a state? Let us illustrate this question by an example. We shall contrast two situations. Consider a 
source which emits spin 1/2 particles. In the first case the particles are polarized with equal probability along either 
the $+z$ ($|\uparrow_z\rangle$) or $-z$ ($|\downarrow_z\rangle$) directions. In the second case the states are polarized along random directions uniformly 
distributed on the sphere. Suppose we want to identify the states as well as possible according to the fidelity eq. (1). 
In the first case it is obvious that a measurement of $\sigma_z$ perfectly identifies the state, hence the fidelity is $F = 1$. 
In the second case, it has been shown [?] that the measurement of $\sigma_z$ is also optimal. But in this case the states 
cannot be identified perfectly, and the fidelity is only $F = 2/3$. 
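
As a numerical illustration (not part of the original argument), the value $F = 2/3$ can be checked with a short Python sketch. It assumes the squared-overlap fidelity of eq. (1) and that, after a $\sigma_z$ measurement, one guesses the eigenstate corresponding to the observed outcome; the function name and the midpoint integration rule are choices made for this sketch only.

```python
from math import fsum


def mean_fidelity_sigma_z(num_points: int = 20000) -> float:
    """Average fidelity of a sigma_z measurement for spins uniform on the sphere.

    Assumes f = |<psi|phi_guess>|^2 and that the guess is the sigma_z
    eigenstate associated with the observed outcome.
    """
    # For directions uniform on the sphere, u = cos(theta) is uniform on [-1, 1].
    du = 2.0 / num_points
    terms = []
    for k in range(num_points):
        u = -1.0 + (k + 0.5) * du      # midpoint of the k-th integration cell
        p_up = (1.0 + u) / 2.0         # probability of outcome +1, equal to |<up|psi>|^2
        p_down = 1.0 - p_up            # probability of outcome -1
        # Outcome +1 -> guess |up>, whose fidelity with |psi> is p_up (same for -1).
        terms.append(p_up * p_up + p_down * p_down)
    # Integrate over u and divide by the length of the interval [-1, 1].
    return fsum(terms) * du / 2.0


F_random = mean_fidelity_sigma_z()
print(F_random)
```

The integrand reduces to $(1 + u^2)/2$, whose mean over $u \in [-1, 1]$ is $2/3$, in agreement with the value quoted above.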

Nevertheless the two situations seem extremely similar. In both cases we perform the same measurement. And in 
both cases, before we perform the measurement, we know that the outcomes of the measurement are either +1 or -1, 
and that the a priori probabilities of the two outcomes are equal. When we perform the measurement this uncertainty is 
resolved. Hence in both cases the measurement yields 1 bit of information. Our main idea is to interpret this quantity 
as the information we extract from the state. Incidentally we note that in both cases this information (the Shannon 
information of the outcomes) equals the von Neumann entropy of the unknown states (both are equal to 1). 

This idea might seem paradoxical at first sight because in one case we completely recognize the state whereas in 
the other case we recognize it badly. To understand this, let us introduce a classical source that decides which quantum 
state is emitted from the quantum source (see figure 1). In the first case the classical source must only specify one 
bit (either $+z$ or $-z$) to determine which state is emitted. In the second case it must provide a direction $n_{in}$ (i.e. an 
infinite number of bits) in order to specify the state $|\uparrow_{n_{in}}\rangle$. In both cases one extracts one bit of information. In the 
first case this means that the classical information supplied by the source is completely recovered. In the second case 
information is lost. However it is now clear that the loss does not occur during the measurement, but during the first 
step, where classical information is converted into quantum information. 

FIG. 1. Chain of events leading to a quantum state estimation problem. The classical source specifies which state should be 
sent. The quantum source then emits the corresponding state. Finally the measuring device tries to identify the emitted state. 

To summarize, the quantum state estimation problem as presented in figure 1 consists of a chain of events which 
starts with a classical source that tells the quantum source what state to emit, and ends with the measurement. The 
fidelity measures the overall performance of the chain since it depends linearly on the scalar product $n_{in} \cdot n_{guess}$. On the 
other hand the number of bits in the output characterizes how much information is extracted by the measurement. 
Therefore in this article we shall focus on the latter quantity. 



III. MAIN RESULT 



The preceding discussion suggests that the Shannon information of the outcomes 

    $I_{output} = -\sum_j p_j \ln p_j$ ,    (5)

where $p_j = \sum_i p_i \, p(j|i)$ is the probability of outcome $j$, measures how much information is extracted from the state. 
This idea however has to be refined. 

The main problem is that there may be redundancies in the outputs of the measurement. As a trivial example, 
a measurement could be accompanied by the flip of a coin, and the outcomes of the measurement would consist of 
both the outcomes of the measurement proper and the outcomes of the coin flip. This adds one bit to the entropy 
of the outputs without telling anything about the system. In less trivial examples involving POVM's and ancillas, 
redundancies can arise in a less obvious way, and it is not immediate how they can be identified and eliminated. 
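
The coin-flip example can be made concrete with a small Python sketch. The two-outcome measurement with equal probabilities is an illustrative choice, and logarithms are taken to base 2 so that entropies come out in bits (an assumption about the convention used in the text).

```python
from math import log2


def shannon_entropy_bits(probs):
    """Shannon entropy -sum_j p_j log2 p_j of a probability distribution, in bits."""
    return -sum(p * log2(p) for p in probs if p > 0.0)


# Outcome distribution of the measurement proper (illustrative values).
measurement = [0.5, 0.5]
# An independent fair coin whose face is recorded together with each outcome.
coin = [0.5, 0.5]
# Joint distribution of (measurement outcome, coin face): a product distribution.
joint = [p * q for p in measurement for q in coin]

h_measurement = shannon_entropy_bits(measurement)
h_joint = shannon_entropy_bits(joint)
print(h_measurement, h_joint)
```

The joint record carries one more bit of entropy than the measurement alone, although the coin says nothing about the system.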

Our main result is that no matter what property of the system one wants to measure, when the redundancy is 
eliminated, the remaining Shannon information of the outputs has a universal upper bound which is the von Neumann 
entropy of the quantum source: 

    $I_{output}(\text{no redundancy}) \le I^{VN}_{input}$ ,    (6)

where $I^{VN}_{input} = -\mathrm{Tr}\,\rho \ln \rho$ is the von Neumann entropy of the quantum source and $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$ is the 
density matrix of the quantum source (logarithms are taken to base 2, so that entropies are expressed in bits). 

One does not always attain equality in eq. (6). Indeed some questions are more informative about the system 
than others. Less informative questions can be answered by measurements whose output entropy is smaller. More 
informative questions require measurements with more entropy. But the most detailed questions can always be 
answered in $I^{VN}_{input}$ bits. 



IV. STRATEGY 

The main problem we face in deriving eq. (6) is to eliminate the redundancy. In order to do this we shall proceed 
in several steps. 

1. The first step is to decide which property we are interested in. We may fix the property directly (for instance 
decide to measure the average of $\sigma_z$) or implicitly by choosing a fidelity. In the rest of this paper we shall 
adopt the second approach. 

2. We then look at optimal measurements, that is, measurements which maximize the fidelity. In general there is 
an entire class of such measurements. 

3. We perform a second optimization. Namely, among the optimal measurements we look for the measurements 
which minimize $I_{output}$. 

This double optimization strategy has already been considered for some particular cases in [?], [?], [6]. 

One expects that this strategy yields measurements which have no spurious redundancy. However as we will find 
out later through some examples, redundancies cannot be completely eliminated by the above procedure and we will 
have to further modify it. 

These further modifications are motivated by the classical and quantum theory of information [?], [?] which suggest 
the idea of performing measurements on blocks of quantum states, rather than on individual particles. Thus we shall 
allow the measuring device to accumulate a large number $L$ of input states before making a collective measurement on 
the $L$ states simultaneously. It is in the context of these collective measurements that we make the two optimizations 
(points 2 and 3 above) and thereby eliminate the spurious redundancies. 

We want to emphasize that this procedure cannot increase the fidelity since the subsequent particles are completely 
uncorrelated. However by considering measurements on large blocks we can hope to reduce the redundancy of the 
measurement, i.e. the entropy of the outcomes, by making "better use" of each outcome. 

Two technicalities have to be taken into account. First of all we must take care not to modify the definition of fidelity 
as we go from measurements on single particles to block measurements. That is, the fidelity must still be the fidelity 
of each state individually, rather than the fidelity for the whole block. Second, we should not require the measurement 
to absolutely maximize the fidelity, since then using block measurements does not help to reduce the entropy (this 
follows once more from the fact that the subsequent states are completely uncorrelated). Instead, following the ideas 
of information theory, we shall only require that the measurement has a fidelity approaching arbitrarily closely the 
optimum. In this framework we shall prove eq. (6). 

To summarize, there is no best way of estimating an unknown quantum state. Different measurements will learn 
about different properties of the state, and it is up to us to choose which property we want to learn about. However once 
we fix the property we want to learn about, we show that quantitatively one cannot learn more than $I^{VN}_{input} = -\mathrm{Tr}\,\rho \ln \rho$ 
bits about this property. That is, a measurement can extract at most one meaningful bit from each qubit coming from 
the source. 



V. EXAMPLES 



Before embarking on a proof of our result, we give two examples which illustrate the main points that must be 
taken into account in the proof. 

In the first example there are two possible input states $|\psi_1\rangle = \alpha|\uparrow\rangle + \beta|\downarrow\rangle$ and $|\psi_2\rangle = \alpha|\uparrow\rangle - \beta|\downarrow\rangle$ (with $\alpha$, $\beta$ real) which occur 
with equal probability. The density matrix of the source is $\rho = \alpha^2 |\uparrow\rangle\langle\uparrow| + \beta^2 |\downarrow\rangle\langle\downarrow|$ which is different from the 
identity for $\alpha \neq \beta$. Therefore the von Neumann entropy of the input states is $I^{VN}_{input} < 1$ qubit. 

In this example we use a fidelity defined as follows: after each measurement one must guess whether the state is 
$|\psi_1\rangle$ or $|\psi_2\rangle$. In case of a correct guess one receives a score of +1, and for an incorrect guess one receives a score of 
-1. The aim is to maximize the average score. The techniques of section VII can be used to show that the optimal 
measurement is a von Neumann measurement of $\sigma_x$, see figure 2. The two outcomes of this measurement occur with 
equal probability, and hence $I_{output} = 1 > I^{VN}_{input}$. 
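
The gap between the two entropies in this example can be checked numerically. The sketch below assumes real amplitudes $\alpha = \cos\theta$, $\beta = \sin\theta$ with the illustrative value $\theta = \pi/8$, and base-2 logarithms; these choices and the helper name are not taken from the text.

```python
from math import cos, sin, log2, pi


def binary_entropy_bits(p):
    """Entropy in bits of a two-outcome distribution (p, 1 - p)."""
    return -sum(q * log2(q) for q in (p, 1.0 - p) if q > 0.0)


theta = pi / 8                          # illustrative value, alpha != beta
alpha, beta = cos(theta), sin(theta)

# rho = diag(alpha^2, beta^2), so its von Neumann entropy is a binary entropy.
input_entropy = binary_entropy_bits(alpha ** 2)

# sigma_x measurement: probability of outcome +x for each input state.
p_plus_given_1 = (alpha + beta) ** 2 / 2.0
p_plus_given_2 = (alpha - beta) ** 2 / 2.0
p_plus = 0.5 * p_plus_given_1 + 0.5 * p_plus_given_2

output_entropy = binary_entropy_bits(p_plus)
print(input_entropy, output_entropy)
```

For any $\alpha \neq \beta$ the outcome probability stays at 1/2, so the output entropy remains 1 bit while the input entropy drops below 1.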



FIG. 2. The two input states $|\psi_{1,2}\rangle = \alpha|\uparrow\rangle \pm \beta|\downarrow\rangle$. The optimal measurement is a measurement of the spin in the x 
direction. 

In this example, a natural first step in eliminating the redundancy is to project blocks of input states onto their 
probable subspace [?]. This projection succeeds with arbitrarily high probability, and affects the input states 
arbitrarily little. But it reduces the dimensionality of the Hilbert space of the input states from $2^N$ to $2^{N I^{VN}_{input}}$. Hence 
if we can prove that there is a von Neumann measurement restricted to the probable subspace that is optimal, we will 
have proved our claim. However the construction of such a von Neumann measurement is non trivial, as is illustrated 
in the next example. 

In our second example there is no "most probable" subspace because the density matrix of the inputs is completely 
random. In this example there are three input states $|\psi_1\rangle = |\uparrow\rangle$, $|\psi_2\rangle = \frac{1}{2}|\uparrow\rangle + \frac{\sqrt{3}}{2}|\downarrow\rangle$, $|\psi_3\rangle = \frac{1}{2}|\uparrow\rangle - \frac{\sqrt{3}}{2}|\downarrow\rangle$, 
each occurring with equal probability $p_i = 1/3$. The density matrix of these states is $\rho = I/2$ and their entropy is 
$I^{VN}_{input} = 1$ qubit. The fidelity is defined as above: after the measurement one must guess which was the input state. 
If the guess is correct one scores +1 point; if the guess is incorrect, one scores -1 point. The aim is to maximize the 
average score (fidelity). 

FIG. 3. The three input states $\psi_1$, $\psi_2$, $\psi_3$ in the second example. The optimal measurement is a POVM whose elements are 
projectors onto the three states $\psi_1$, $\psi_2$, $\psi_3$. 



Using the techniques of section VII, one can show that the elements of an optimal POVM are necessarily proportional 
to the three projectors $|\psi_1\rangle\langle\psi_1|$, $|\psi_2\rangle\langle\psi_2|$, $|\psi_3\rangle\langle\psi_3|$, see figure 3. Therefore the optimal POVM whose output entropy 
is minimum is $\{\frac{2}{3}|\psi_1\rangle\langle\psi_1|, \frac{2}{3}|\psi_2\rangle\langle\psi_2|, \frac{2}{3}|\psi_3\rangle\langle\psi_3|\}$. In this case $I_{output} = \ln 3 > 1$ bits. The other optimal measurements 
have larger $I_{output} > \ln 3$ bits. One can also show that there is no measurement on blocks of $L$ input states whose 
fidelity is strictly equal to the optimum and whose output entropy is less than $L \ln 3$ bits. However if one only requires 
that the fidelity is arbitrarily close to the maximum, then in the asymptotic limit ($L \to \infty$) the output entropy can 
be made arbitrarily close to $L$ bits, thereby attaining the bound eq. (6). The main difficulty of the proof will be to 
construct such a measurement on large blocks whose output entropy is equal to $L$ bits and whose fidelity is arbitrarily 
close to the optimal fidelity. 
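
The single-particle statements of this second example can be verified with a short Python sketch. It assumes the three states are the real "trine" states $|\uparrow\rangle$, $\frac{1}{2}|\uparrow\rangle \pm \frac{\sqrt{3}}{2}|\downarrow\rangle$ and that entropies are measured with base-2 logarithms; the helper functions are introduced for this illustration only.

```python
from math import log2, sqrt

# The three input states in the {|up>, |down>} basis (real amplitudes assumed).
states = [(1.0, 0.0), (0.5, sqrt(3.0) / 2.0), (0.5, -sqrt(3.0) / 2.0)]
probs = [1.0 / 3.0] * 3


def projector(v):
    """2x2 matrix |v><v| for a real vector v."""
    return [[v[r] * v[c] for c in range(2)] for r in range(2)]


def weighted_sum(weights, matrices):
    """Entry-wise sum_k weights[k] * matrices[k] for 2x2 matrices."""
    return [
        [sum(w * m[r][c] for w, m in zip(weights, matrices)) for c in range(2)]
        for r in range(2)
    ]


def trace_of_product(a, b):
    """Tr(a b) for 2x2 matrices."""
    return sum(a[r][c] * b[c][r] for r in range(2) for c in range(2))


projectors = [projector(v) for v in states]
rho = weighted_sum(probs, projectors)                 # density matrix of the source
povm = [[[2.0 / 3.0 * x for x in row] for row in p] for p in projectors]
povm_sum = weighted_sum([1.0] * 3, povm)              # completeness check
outcome_probs = [trace_of_product(rho, a) for a in povm]
output_entropy = -sum(p * log2(p) for p in outcome_probs)
print(rho, povm_sum, output_entropy)
```

The density matrix comes out as $I/2$, the three elements sum to the identity, and each outcome has probability 1/3, so the single-particle output entropy exceeds the 1 bit of input entropy.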



VI. PLAN OF THE PROOF 



The main part of this paper is devoted to proving the bound eq. (6). In section VII we introduce a large class 
of fidelities, and derive some properties of the optimal measurements. In section VIII we show how to generalize 
these fidelities to measurements on large blocks of input states. At the end of section VIII we are in a position to 
state with precision a first version of our main result, eq. (6). In section IX we extend the notion of fidelity and 
state a slightly more general version of our result. In section XI we show how to construct a measurement on large 
blocks which has little redundancy. In section XII we derive an intermediate result concerning the fidelity of the 
measurement constructed in section XI. If the states are uniformly distributed in Hilbert space (i.e. the density 
matrix is proportional to the identity, $\rho = I/d$), then this intermediate result already proves our main claim eq. (6). 
When the states are not uniformly distributed in Hilbert space, we must first project blocks of states onto the 
probable subspace before using the intermediate result of section XII. This is done in section XIII and completes the 
proof of eq. (6). 



VII. FIDELITY 



Let us consider the general setup described in figure 1. The states $|\psi_i\rangle$ emitted by the quantum source belong to a 
Hilbert space of dimension $d$. They occur with probability $p_i$. Their density matrix is $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$ with $\mathrm{Tr}\,\rho = 1$. 
The most general measurement on the input states is a POVM with $M$ elements: $a_j \ge 0$, $\sum_{j=1}^{M} a_j = I_d$. 

We introduce the fidelity in the following way. To each outcome $j$ of the measurement we associate a state $|\phi_j^{guess}\rangle$ 
which is our "guess" as to what the input state was. The correctness of this guess is measured by a function, the 
fidelity, which depends on the input state and the guessed state: $f(|\phi_j^{guess}\rangle, |\psi_i\rangle)$. For instance $f$ could have the form 
eq. (1) or (3). The mean fidelity is then 

    $F = \sum_i p_i \sum_j p(j|i) \, f(|\phi_j^{guess}\rangle, |\psi_i\rangle)$ ,    (7)

where the probability to obtain outcome $j$ if the state is $|\psi_i\rangle$ is 

    $p(j|i) = \langle \psi_i | a_j | \psi_i \rangle$ .    (8)

An optimal measurement is one which maximizes the mean fidelity $F$. 
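
Eqs. (7) and (8) translate directly into a few lines of Python. The sketch below is restricted to real state vectors and real symmetric POVM elements for simplicity, and it indexes the score function by the outcome $j$ and the input label $i$ rather than by the states themselves; both simplifications, and all names, are assumptions of this illustration. It is evaluated on the first example of section V with a $\sigma_x$ measurement and the $\pm 1$ score, for the illustrative value $\theta = \pi/8$.

```python
from math import cos, sin, pi


def expectation(state, operator):
    """<state|operator|state> for a real vector and a real symmetric matrix."""
    n = len(state)
    return sum(state[r] * operator[r][c] * state[c] for r in range(n) for c in range(n))


def mean_fidelity(states, probs, povm, score):
    """F = sum_i p_i sum_j <psi_i|a_j|psi_i> f(j, i), following eqs. (7) and (8).

    score(j, i) plays the role of f(|phi_j^guess>, |psi_i>).
    """
    return sum(
        p_i * expectation(psi_i, a_j) * score(j, i)
        for i, (psi_i, p_i) in enumerate(zip(states, probs))
        for j, a_j in enumerate(povm)
    )


theta = pi / 8                                    # illustrative value
alpha, beta = cos(theta), sin(theta)
states = [(alpha, beta), (alpha, -beta)]          # |psi_1>, |psi_2>
probs = [0.5, 0.5]
# Projectors onto the sigma_x eigenstates (|up> +/- |down>)/sqrt(2).
povm = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, -0.5], [-0.5, 0.5]]]


def plus_minus_score(j, i):
    """+1 for a correct guess (outcome j taken as the guess psi_j), -1 otherwise."""
    return 1.0 if j == i else -1.0


F = mean_fidelity(states, probs, povm, plus_minus_score)
print(F, 2 * alpha * beta)
```

Because the guessing strategy is fixed, `mean_fidelity` is linear in each POVM element, which is the property exploited in the optimization that follows.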

This is a rather general formulation of the state estimation problem. However the fidelity is not the most general 
one could consider. To see this let us consider the optimization of $F$. When we make the optimization, we must 
compare the value of $F$ for different POVM's; however the guessed states $|\phi_j^{guess}\rangle$ are kept fixed. That is, the guessing 
strategy is fixed once and for all, and we try to optimize the measurement for a fixed guessing strategy. The advantage 
of formulating the fidelity in this way is technical: it ensures that the fidelity depends linearly on the POVM elements. 
We shall show in section IX how to extend our result to more general fidelities for which the guessed states $|\phi_j^{guess}\rangle$ 
are not kept fixed. 

We summarize here the main properties of optimal measurements for the fidelity eq. (7), see also [?]. 

First of all note that we can always take the optimal POVM to consist of one dimensional projectors $b_j = |b_j\rangle\langle b_j|$ 
(the $|b_j\rangle$ are not normalized). Indeed refining a POVM can only increase the fidelity. This can be seen formally in the 
following way: suppose the $a_j$ are an optimal POVM, but not necessarily made out of one dimensional projectors. 
Then each $a_j$ can always be decomposed as $a_j = \sum_k |b_{jk}\rangle\langle b_{jk}|$ since it is a positive operator. Inserting this into the 
expression for $F$ one sees that the $b_{jk}$ (to which we associate the guessed state $\phi_j^{guess}$) are also optimal. 

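Thus we can optimize $F$ in the class of POVM's whose elements are one dimensional projectors $|b_j\rangle\langle b_j|$. These 
projectors are subject to the completeness condition $\sum_j |b_j\rangle\langle b_j| = I_d$. This can be implemented by introducing $d^2$ Lagrange 
multipliers $\lambda_{\mu\nu}$ which we group into one operator $\lambda$: 

    $F = \sum_i p_i \sum_j \langle\psi_i|b_j\rangle\langle b_j|\psi_i\rangle \, f(|\phi_j^{guess}\rangle, |\psi_i\rangle) - \mathrm{Tr}\big[\lambda\big(\sum_j |b_j\rangle\langle b_j| - I_d\big)\big]$ 
    $\;\;\, = \sum_j \mathrm{Tr}[(F_j - \lambda)|b_j\rangle\langle b_j|] + \mathrm{Tr}\,\lambda$ ,    (9)

where $F_j = \sum_i p_i |\psi_i\rangle\langle\psi_i| \, f(|\phi_j^{guess}\rangle, |\psi_i\rangle)$. If we vary this with respect to $\langle b_j|$, we obtain the equations 

    $(F_j - \lambda)|b_j\rangle = 0$ .    (10)

Inserting this into eq. (9) shows that $F = \mathrm{Tr}\,\lambda$. 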

Eq. (10) is the essential equation to find optimal measurements explicitly. For instance consider the first example 
of section V. There are two input states $\psi_1$ and $\psi_2$ and two guessed states $\phi_1^{guess} = |\psi_1\rangle$ and $\phi_2^{guess} = |\psi_2\rangle$. If the 
input state is $|\psi_1\rangle$ and one guesses $\phi_1^{guess}$, then $f = +1$, whereas if the input state is $|\psi_2\rangle$ and one guesses $\phi_1^{guess}$, 
then $f = -1$, hence $F_1 = \frac{1}{2}(|\psi_1\rangle\langle\psi_1| - |\psi_2\rangle\langle\psi_2|) = +\alpha\beta\sigma_x$. Similarly $F_2 = \frac{1}{2}(|\psi_2\rangle\langle\psi_2| - |\psi_1\rangle\langle\psi_1|) = -\alpha\beta\sigma_x$. The 
task is then to find an operator $\lambda$ such that the null eigenvectors of $F_{1,2} - \lambda$ can satisfy the completeness relation. The 
only possibility is $\lambda = \alpha\beta I$. Therefore the optimal measurement is along the x axis, and $F_{max} = 2\alpha\beta$. The second 
example of section V can be treated along similar lines. 
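
The condition eq. (10) can be checked explicitly for this example. The Python sketch below assumes real amplitudes $\alpha = \cos\theta$, $\beta = \sin\theta$ with the illustrative value $\theta = \pi/8$, and uses small hand-written 2x2 helpers that are not part of the original text.

```python
from math import cos, sin, sqrt, pi

theta = pi / 8                                    # illustrative value
alpha, beta = cos(theta), sin(theta)
psi1, psi2 = (alpha, beta), (alpha, -beta)


def outer(v):
    """2x2 matrix |v><v| for a real vector v."""
    return [[v[r] * v[c] for c in range(2)] for r in range(2)]


def sub(a, b):
    """Entry-wise difference a - b of 2x2 matrices."""
    return [[a[r][c] - b[r][c] for c in range(2)] for r in range(2)]


def scale(s, a):
    """Scalar multiple s * a of a 2x2 matrix."""
    return [[s * a[r][c] for c in range(2)] for r in range(2)]


def apply(m, v):
    """Matrix-vector product m v."""
    return tuple(sum(m[r][c] * v[c] for c in range(2)) for r in range(2))


# F_j = sum_i p_i |psi_i><psi_i| f(j, i), with p_i = 1/2 and f = +1 / -1.
F1 = scale(0.5, sub(outer(psi1), outer(psi2)))    # equals +alpha*beta*sigma_x
F2 = scale(0.5, sub(outer(psi2), outer(psi1)))    # equals -alpha*beta*sigma_x
lam = [[alpha * beta, 0.0], [0.0, alpha * beta]]  # lambda = alpha*beta*I

b_plus = (1.0 / sqrt(2.0), 1.0 / sqrt(2.0))       # sigma_x eigenvector, eigenvalue +1
b_minus = (1.0 / sqrt(2.0), -1.0 / sqrt(2.0))     # sigma_x eigenvector, eigenvalue -1

residual_1 = apply(sub(F1, lam), b_plus)          # (F_1 - lambda)|b_1>
residual_2 = apply(sub(F2, lam), b_minus)         # (F_2 - lambda)|b_2>
f_max = lam[0][0] + lam[1][1]                     # Tr(lambda)
print(residual_1, residual_2, f_max)
```

Both residuals vanish, the two eigenvectors of $\sigma_x$ resolve the identity, and $\mathrm{Tr}\,\lambda$ reproduces $F_{max} = 2\alpha\beta$.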

An important consequence of eq. (10) is an explicit expression for the value of $F$ if the measurement is not optimal. 
Consider a measurement $a'_j$ which is not optimal, but each positive operator $a'_j$ is "close" to the corresponding 
operator $b_j$ of the optimal measurement. We then decompose the operator $a'_j$ in terms of its components along $|b_j\rangle$: 

    $a'_j = x_j |b_j\rangle\langle b_j| + y_j |b_j\rangle\langle b_j^\perp| + y_j^* |b_j^\perp\rangle\langle b_j| + z_j$ 

where the state $|b_j^\perp\rangle$ is orthogonal to $|b_j\rangle$ and the operator $z_j$ obeys 
$z_j|b_j\rangle = 0$, $\langle b_j|z_j = 0$. Inserting this decomposition into the expression for $F$, we obtain 

    $F(a') = \mathrm{Tr}\,\lambda + \sum_j \mathrm{Tr}[(F_j - \lambda) a'_j]$ 
    $\qquad\; = F_{max} + \sum_j \mathrm{Tr}[(F_j - \lambda) z_j]$ 
    $\qquad\; \ge F_{max} - C \sum_j \mathrm{Tr}\, z_j$ ,    (11)

where we have used eq. (10) and $C$ is some positive constant independent of $j$. This expresses in a simple way how 
much the fidelity differs from its maximal value in terms of how much the measurement differs from the optimal 
measurement. 



VIII. FIDELITY FOR MEASUREMENTS ON LARGE BLOCKS 

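As discussed above it is necessary to also consider measurements on large blocks of $L$ input states $|\psi_{i_1} \ldots \psi_{i_L}\rangle$. The 
fidelity for measurements on large blocks is 

    $F_L = \sum_{i_1,\ldots,i_L} p_{i_1} \cdots p_{i_L} \sum_{j=1}^{N} \langle\psi_{i_1} \ldots \psi_{i_L}| A_j |\psi_{i_1} \ldots \psi_{i_L}\rangle \, \frac{1}{L} \sum_{k=1}^{L} f(|\phi_{j,k}^{guess}\rangle, |\psi_{i_k}\rangle)$ ,    (12)

where $A_j$ is the measurement on the $L$ input states. The guessed state is the product $|\Phi_j^{guess}\rangle = |\phi_{j,1}^{guess} \ldots \phi_{j,L}^{guess}\rangle$. 
The fidelity is taken to be the average of the fidelities for each state $|\psi_{i_k}\rangle$. This ensures that eq. (12) is just 
the average of the fidelities eq. (7), as can be seen by rewriting $F_L$ as 

    $F_L = \frac{1}{L} \sum_{k=1}^{L} \sum_{j=1}^{N} \sum_{i_k} p_{i_k} \langle\psi_{i_k}| A_j^{(k)} |\psi_{i_k}\rangle \, f(|\phi_{j,k}^{guess}\rangle, |\psi_{i_k}\rangle)$ ,    (13)

where the operators $A_j^{(k)}$ are the operators $A_j$ restricted to the space of particle $k$: 

    $A_j^{(k)} = \mathrm{Tr}_{k' \neq k} \big[ A_j \prod_{k' \neq k} \rho_{k'} \big]$ .    (14)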

Note that a possible measurement that maximizes $F_L$ is built out of the measurement $\{a_j\}$ which maximizes eq. (7): 

    $A_{j_1 \ldots j_L} = a_{j_1} \otimes \ldots \otimes a_{j_L}$ .    (15)

This measurement has $M^L$ outcomes. And in general $M$ will be larger than $2^{I^{VN}_{input}}$. 

Our main result is that one can always construct measurements with $2^{I^{VN}_{input}}$ outcomes per input state which 
also maximize $F$. Stated with precision we shall prove the following result: 

Consider a state estimation problem in which the unknown states $|\psi_i\rangle$ have density matrix $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$ and 
von Neumann entropy $I^{VN}_{input} = -\mathrm{Tr}\,\rho \ln \rho$. The quality of the state estimation is measured by a fidelity of the form eq. 
(7). Given any $\epsilon > 0$ and $\eta > 0$, there exists $L_0$ such that for any $L > L_0$, and any $N$ larger than $2^{L(I^{VN}_{input} + \eta)}$, 
there exists a measurement on sequences of $L$ input states which has $N$ outcomes and attains a fidelity $F_L \ge F_{max} - \epsilon$. 
The Shannon entropy of the outputs per input state, $I_{output}/L$, can therefore be made equal to or less than $I^{VN}_{input} + \eta$. 

It is this result that will be proven in sections XI to XIII. 



IX. OTHER FIDELITIES 



Our main result, as stated with precision at the end of the preceding section, applies only to fidelities of the form 
eq. (7) with fixed guessed states. In this section we enquire whether it can be generalized to other fidelities. 

As a first generalization, we consider fidelities of the form eq. (7), but for which both the POVM elements $\{a_j\}$ and 
the guessed states are undetermined and must be varied to find the optimum estimation strategy. That is, whereas 
in section VII the specification of an estimation strategy consisted only of the POVM elements $\{a_j\}$, it now consists 
of the set $\{a_j, \phi_j^{guess}\}$ which comprises both the POVM elements and the guessed states. An example of such more 
general fidelities was considered in [?]. The unknown states $|\psi_i\rangle$ were taken to be $n$ spin 1/2 particles all polarized 
along the same direction, and the fidelity was taken to be the scalar product of one spin polarized along $\Omega$ with one 
spin polarized along the guessed direction, $f = |\langle \uparrow_\Omega | \uparrow_{\Omega_{guess}} \rangle|^2$. 

It is easy to show that our main result eq. (6) also applies to such more general fidelities for which both the POVM 
elements and the guessed states can be varied. First note that one can always find an optimal estimation strategy with 
only a finite number $M$ of outcomes. Associated to each outcome is a guessed state $\phi_j^{guess(OPT)}$, $j = 1, \ldots, M$. 
Let us now consider the subclass of estimation strategies $\{a_j, \phi_j^{guess(OPT)}\}$ for which the guessed states are fixed to be 
an optimal set and only the POVM elements can vary. Note that the optimal fidelity for this subclass is equal to the 
optimal fidelity for the more general estimation strategy since the guessed states are taken to be optimal. Since for 
this subclass only the POVM elements can vary, we are in the conditions of sections VII and VIII. The result stated 
at the end of section VIII therefore applies. Hence there exists a measurement on large blocks whose output entropy 
is less than or equal to the von Neumann entropy of the input states and whose fidelity is greater than the optimal fidelity 
minus $\epsilon$. This shows that our main result also holds for these more general fidelities. 

One can however construct even more general fidelities (for instance by taking the fidelity to be non linear in the 
POVM elements). For such more general fidelities it is an open question whether our claim also applies. One example 
of such more general fidelities is the mutual information eq. (4). For this particular example our claim also holds. 
This is discussed in the next section. 



X. RELATION TO THE CLASSICAL CAPACITY OF A QUANTUM CHANNEL 

In the state estimation problem as presented in figure 1, the classical source specifies in a completely random manner 
which quantum state is emitted. The task of the measurement is to recognize as well as possible which state was 
emitted by the quantum source. It is instructive to compare this to the problem of classical communication through a 
quantum channel [?]. In this case the classical source chooses a controlled subset of all possible sequences (called 
code words) in such a way that they can be recognized (almost) perfectly by the receiver. He can then communicate 
classical information reliably through the quantum channel. The relation between the two problems is that in the 
communication problem the receiver must recognize the code words, so he is confronted with a state estimation 
problem, although it is a particular one. 

For this reason the two problems are related both conceptually and formally. On the conceptual side, a corollary 
of our main result is an alternative proof of Holevo's upper bound on the classical capacity of a quantum channel [?] 
in the case where the quantum channel consists of pure states. Indeed if the message is to be transmitted faithfully, 
Bob must recognize the code words with high fidelity. We can now view the code words as the states $|\psi_i\rangle$ that are 
emitted by the quantum source in figure 1. The von Neumann entropy of the words is less than $n I^{VN}(\rho)$ where $n$ is 
the number of letters in a word and $I^{VN}(\rho)$ is the von Neumann entropy of the letters. Recall now that the question 
answered in this paper is to find, among all the measurements which recognize the input words with high fidelity, 
those whose output has the minimum entropy. Clearly this minimum entropy is an upper bound to the capacity of 
the channel. We have shown that it is less than or equal to the von Neumann entropy of the channel. Thus the quantum 
channel has a classical capacity less than $I^{VN}(\rho)$ bits per letter, confirming Holevo's result. 

On the formal side, the techniques we have used to construct a measurement which minimizes the entropy of 
the outputs are closely related to and inspired by the techniques used to construct a decoding measurement which 
maximizes the capacity of the channel [?]. There is however a very important difference with the communication 
problem. Indeed in that case one can easily build a measurement with a small number of outcomes (corresponding to a 
few code words, i.e. to a small capacity), and the task is to try to maximize the number of outcomes of the measurement 
while continuing to recognize the code words faithfully. In this paper we can easily build a measurement with a high 
fidelity (i.e. which is optimal), but with a large redundancy in the output. The difficulty is to minimize the number of 
outcomes (the redundancy) while keeping the measurement optimal. Nevertheless the mathematical technique that 
we use in section XI to decrease the number of outcomes without substantially modifying the measurement is related 
to the techniques used in [?]. 



XI. ELIMINATING REDUNDANCY 



Our aim in this section is to construct a measurement with fewer outcomes than the optimal measurement eq. (15). 
The next two sections will be devoted to proving that this measurement does not diminish the fidelity. This measurement 
is very similar to the measurement used in [?] to decode a classical message sent through a quantum communication 
channel. 

We start from the optimal POVM acting on one input state and decomposed into one dimensional projectors 
$b_i = |b_i\rangle\langle b_i|$. We express it in terms of the normalized operator $\tilde{b}_i = |\tilde{b}_i\rangle\langle \tilde{b}_i| = b_i/\mathrm{Tr}(b_i)$ as $b_i = \beta_i \tilde{b}_i$. (Throughout 
the text we shall denote normalized operators by a tilde.) The $\beta_i$ sum to $\sum_i \beta_i = d$, obtained by taking the trace of the 
completeness relation. 

We now construct $N$ operators acting on the space of $L$ input states: 

    $B_j = |B_j\rangle\langle B_j| = \tilde{b}_{j_1} \otimes \ldots \otimes \tilde{b}_{j_L}$    (16)

where each $\tilde{b}_{j_k}$ is chosen randomly and independently from the set $\tilde{b}_1, \ldots, \tilde{b}_M$ with probabilities $p_1 = \beta_1/d, \ldots, p_M = \beta_M/d$. 

The $|B_j\rangle$ span a subspace $H_B$ of the Hilbert space of the $L$ input states. In this subspace the operator $B = \sum_j B_j$ 
is strictly positive, hence we can construct the operators 

    $C_j = |C_j\rangle\langle C_j| = B^{-1/2} B_j B^{-1/2}$ .    (17)

The $C_j$ are positive operators, which sum up to the identity in $H_B$: $\sum_{j=1}^{N} C_j = \Pi_B$ where $\Pi_B$ is the projector onto 
$H_B$. The POVM we shall use consists of the $C_j$ and the projector onto the complementary subspace $C_0 = I_{d^L} - \Pi_B$ 
($I_{d^L}$ is the identity on the Hilbert space of the $L$ input states). 
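
The construction of eqs. (16) and (17) can be sketched numerically. The example below uses NumPy (a third-party library) and, purely for illustration, takes the single-particle POVM to be three real qubit projectors with weights $\beta_i = 2/3$, block length $L = 2$ and $N = 6$ random operators; the seed, these values, and the cutoff used to identify the support of $B$ are assumptions of the sketch, not prescriptions from the text.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the sketch is reproducible

# Illustrative single-particle POVM: three real qubit states, beta_i = 2/3, d = 2.
d = 2
angles = [0.0, 2 * np.pi / 3, 4 * np.pi / 3]
kets = [np.array([np.cos(a / 2), np.sin(a / 2)]) for a in angles]
betas = [2.0 / 3.0] * 3
probs = [b / d for b in betas]          # p_i = beta_i / d, as below eq. (16)

L, N = 2, 6                             # block length and number of random operators


def random_product_ket():
    """Tensor product of L single-particle kets drawn independently with p_i."""
    ket = np.array([1.0])
    for i in rng.choice(len(kets), size=L, p=probs):
        ket = np.kron(ket, kets[i])
    return ket


B_ops = [np.outer(k, k) for k in (random_product_ket() for _ in range(N))]
B = sum(B_ops)

# B^{-1/2} restricted to the support H_B of B (pseudo-inverse square root).
evals, evecs = np.linalg.eigh(B)
support = evals > 1e-10
rank = int(support.sum())               # dimension of H_B
inv_sqrt = (evecs[:, support] / np.sqrt(evals[support])) @ evecs[:, support].T

C_ops = [inv_sqrt @ Bj @ inv_sqrt for Bj in B_ops]   # eq. (17)
Pi_B = sum(C_ops)                       # should be the projector onto H_B
C0 = np.eye(d ** L) - Pi_B              # element for the complementary subspace
print(rank, np.round(np.trace(Pi_B), 6))
```

In this sketch the $C_j$ together with $C_0$ form a valid POVM: each element is positive, and they sum to the identity on the full $d^L$-dimensional space.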

Our strategy in the next sections will be to compute the average fidelity $\overline{F_L}$, where the average is taken over possible 
choices of the $\hat B_j$ in eq. (16). We shall show that the average of $F_L$ satisfies our main result stated at the end of section 
VIII. Therefore there necessarily are some choices of the $\hat B_j$ which also satisfy our main result. 

But first we derive some important properties of the $C_j$. We shall obtain mean properties, where the mean is the 
average over choices of the $\hat B_j$ in eq. (16). 

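• The mean of $\hat B_j$ is $\overline{\hat B_j} = I_{d^L}/d^L$. 

• The mean of $B$ is: 

$$\overline{B} = \sum_{j=1}^{N} \overline{\hat B_j} = \frac{N}{d^L}\, I_{d^L} . \tag{18}$$

This motivates our writing 

$$B = \frac{N}{d^L}\left(I_{d^L} + \Delta\right) \tag{19}$$

and subsequently making expansions in $\Delta$. 

The dimension of $H_B$ is 

$$\dim H_B = \sum_j \mathrm{Tr}\,C_j = \sum_j \mathrm{Tr}\,B^{-1}\hat B_j = \frac{d^L}{N}\sum_j \mathrm{Tr}\,\frac{1}{I_{d^L}+\Delta}\,\hat B_j \ge \frac{d^L}{N}\sum_j \mathrm{Tr}\,(I_{d^L}-\Delta)\,\hat B_j . \tag{20}$$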



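Furthermore 

$$\mathrm{Tr}\,\Delta\hat B_j = \mathrm{Tr}\left[\left(\frac{d^L}{N}B - I_{d^L}\right)\hat B_j\right] = \mathrm{Tr}\left[\frac{d^L}{N}\Big(\hat B_j + \sum_{k\neq j}\hat B_k\hat B_j\Big) - \hat B_j\right] \tag{21}$$

where we have used the fact that $\hat B_j^2 = \hat B_j$. We now take the average of this expression. Using the fact that for 
$k \neq j$, $\hat B_k$ and $\hat B_j$ are independent, the average of $\hat B_k\hat B_j$ ($k \neq j$) is the product of the averages: $\overline{\hat B_k\hat B_j} = \overline{\hat B_k}\;\overline{\hat B_j} = I_{d^L}/d^{2L}$. 
Hence $\overline{\sum_{k\neq j}\hat B_k\hat B_j} = (N-1)\,I_{d^L}/d^{2L}$. Putting everything together, we find $\overline{\mathrm{Tr}\,\Delta\hat B_j} = (d^L-1)/N$ and 

$$d^L \ge \overline{\dim H_B} \ge d^L\left(1 - \frac{d^L-1}{N}\right) . \tag{22}$$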

This shows that if $N$ is sufficiently large compared with the dimension $d^L$ of the Hilbert space, then the $C_j$ ($j \neq 0$) fill the 
Hilbert space. 
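
To get a feeling for the bound of eq. (22), the following small sketch evaluates its right-hand side for a few values of $N$. The values $d = 2$ and $L = 3$ and the function name are illustrative assumptions.

```python
def mean_dim_lower_bound(d, L, N):
    """Right-hand side of eq. (22): lower bound on the mean dimension of H_B."""
    D = d ** L
    return D * (1 - (D - 1) / N)

d, L = 2, 3                      # illustrative values: total dimension d**L = 8
for N in (8, 80, 800, 8000):
    print(N, mean_dim_lower_bound(d, L, N))
```

The bound approaches $d^L$ only once $(d^L - 1)/N$ is small, which is the regime used in the rest of the argument.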

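Finally we need to know how much the $C_j$ differ from the $\hat B_j$. We write $|C_j\rangle = \alpha_j|\hat B_j\rangle + |B_j^\perp\rangle$ and compute $\alpha_j^2$: 

$$\begin{aligned}
\alpha_j^2 &= \mathrm{Tr}\,C_j\hat B_j \\
&= \mathrm{Tr}\,\hat B_j B^{-1/2}\hat B_j B^{-1/2} \\
&= \left(\mathrm{Tr}\,\hat B_j B^{-1/2}\right)^2 \\
&\ge \frac{d^L}{N}\left(\mathrm{Tr}\,\hat B_j\Big(I_{d^L} - \frac{\Delta}{2}\Big)\right)^2 .
\end{aligned} \tag{23}$$

Hence 

$$\overline{\alpha_j^2} \ge \frac{d^L}{N}\left(1 - \overline{\mathrm{Tr}\,\hat B_j\Delta}\right) = \frac{d^L}{N}\left(1 - \frac{d^L-1}{N}\right) . \tag{24}$$

This is then used to compute the average of $\langle B_j^\perp|B_j^\perp\rangle$: 

$$\overline{\langle B_j^\perp|B_j^\perp\rangle} = \overline{\mathrm{Tr}\,C_j} - \overline{\mathrm{Tr}\,C_j\hat B_j} \le \frac{d^L}{N}\,\frac{d^L-1}{N} , \tag{25}$$

which shows that the $C_j$ are arbitrarily close to the $\hat B_j$ when $N \gg d^L$. 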



XII. AN INTERMEDIATE RESULT 



In this section we shall prove the following intermediate result: 

Suppose that the input states belong to a Hilbert space of dimension $d$ and have a density matrix $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$. 
Denote by $p_{max}$ the largest eigenvalue of $\rho$. Consider measurements on blocks of $L$ input states. 
Choose any positive number $\eta > 0$. Let $N$ be any integer larger than $2^{L(2\ln d + \ln p_{max} + \eta)}$. Then there exist 
measurements with $N$ outcomes with a fidelity $F_L \ge F_{max} - R\,2^{-L\eta}$, where $R$ is a positive constant. 

In the next section we shall combine this intermediate result with the concept of probable subspace of a long 
sequence of states to prove our claim in full generality. 

To prove this intermediate result, we proceed as follows: 

Let $\{b_j = |b_j\rangle\langle b_j|\}$ be a POVM that maximizes the fidelity $F$ eq. ([?]). Using the algorithm of eqs. (16) to (17) we 
construct a measurement $C_j$, $j = 0, \dots, N$, acting on the space of $L$ copies of the input states. 
Let us consider the fidelity for the measurement $C_j$: 

$$F_L = \sum_{j=0}^{N}\frac{1}{L}\sum_{k=1}^{L}\sum_{i_k} p_{i_k}\,\langle\psi_{i_k}|C_j^{(k)}|\psi_{i_k}\rangle\, f(\psi_{i_k},\phi^{guess}_{jk}) \tag{26}$$

where the $C_j^{(k)} = \mathrm{Tr}_{l'\neq k}\big[\prod_{l'\neq k}\rho_{l'}\,C_j\big]$ are defined as in eq. (14). 

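We can decompose $C_j^{(k)}$ (for $j \neq 0$) according to its components along $|\hat b_{j_k}\rangle$: $C_j^{(k)} = X_{jk}|\hat b_{j_k}\rangle\langle\hat b_{j_k}| + Y_{jk}|\hat b_{j_k}\rangle\langle\hat b_{j_k}^\perp| + Y_{jk}^*|\hat b_{j_k}^\perp\rangle\langle\hat b_{j_k}| + z_{jk}$, 
where $z_{jk}|\hat b_{j_k}\rangle = 0$ and $\langle\hat b_{j_k}|z_{jk} = 0$. Inserting this expression in eq. (26), and using eq. (11), yields 

$$F_L \ge \frac{1}{L}\sum_{k=1}^{L}\left(F_{max} - C\sum_{j=1}^{N}\mathrm{Tr}\,z_{jk} - C\,\mathrm{Tr}\,C_0^{(k)}\right) \tag{27}$$

where the last term comes from the $C_0 = I_{d^L} - \Pi_B$ outcome. 

It remains to calculate $\mathrm{Tr}\,C_0^{(k)}$ and $\mathrm{Tr}\,z_{jk}$. We start with the former: 

$$\begin{aligned}
\mathrm{Tr}\,C_0^{(k)} &= \mathrm{Tr}\Big[\Big(\prod_{l'\neq k}\rho_{l'}\Big)\big(I_{d^L} - \Pi_B\big)\Big] \\
&\le (p_{max})^{L-1}\,\mathrm{Tr}\big(I_{d^L} - \Pi_B\big) = (p_{max})^{L-1}\big(d^L - \dim H_B\big)
\end{aligned} \tag{28}$$

where $p_{max}$ is the largest eigenvalue of $\rho$. 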

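To estimate $\mathrm{Tr}\,z_{jk}$ we recall the decomposition $|C_j\rangle = \alpha_j|\hat B_j\rangle + |B_j^\perp\rangle$. We can further decompose $|B_j^\perp\rangle$ according 
to whether, when restricted to the space of the $k$'th particle, it is equal to $|\hat b_{j_k}\rangle$ or not: $|B_j^\perp\rangle = |\hat b_{j_k}\rangle|\phi\rangle + |\hat b_{j_k}^\perp\rangle|\chi\rangle$. 

Inserting this into the trace which yields $C_j^{(k)}$, we obtain 

$$\begin{aligned}
C_j^{(k)} &= \mathrm{Tr}_{l'\neq k}\Big[\prod_{l'\neq k}\rho_{l'}\,\big(\alpha_j|\hat B_j\rangle + |\hat b_{j_k}\rangle|\phi\rangle + |\hat b_{j_k}^\perp\rangle|\chi\rangle\big)\big(\alpha_j^*\langle\hat B_j| + \dots\big)\Big] \\
&= |\hat b_{j_k}\rangle\langle\hat b_{j_k}|\,X_{jk} + |\hat b_{j_k}\rangle\langle\hat b_{j_k}^\perp|\,Y_{jk} + |\hat b_{j_k}^\perp\rangle\langle\hat b_{j_k}|\,Y_{jk}^* + |\hat b_{j_k}^\perp\rangle\langle\hat b_{j_k}^\perp|\,Z_{jk} .
\end{aligned} \tag{29}$$

The coefficients $X_{jk}$, $Y_{jk}$, $Z_{jk}$ are easily calculated. The one of interest is $Z_{jk} = \mathrm{Tr}\,z_{jk}$: 

$$\begin{aligned}
Z_{jk} &= \mathrm{Tr}\Big[\prod_{l'\neq k}\rho_{l'}\,|\chi\rangle\langle\chi|\Big] \\
&\le (p_{max})^{L-1}\langle\chi|\chi\rangle \\
&\le (p_{max})^{L-1}\langle B_j^\perp|B_j^\perp\rangle .
\end{aligned} \tag{30}$$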
Inserting these bounds into the expression for $F_L$ we obtain 

$$F_L \ge \frac{1}{L}\sum_{k=1}^{L}\Big(F_{max} - C\sum_{j=1}^{N}(p_{max})^{L-1}\langle B_j^\perp|B_j^\perp\rangle - C\,(p_{max})^{L-1}\big(d^L - \dim H_B\big)\Big) . \tag{31}$$

We now take the average of this expression over all possible choices of the $\hat b_{j_k}$ operators in eq. (16). Inserting eqs. (22) 
and (25) yields 

$$\overline{F_L} \ge F_{max} - 2C\,(p_{max})^{L-1}\,d^L\,\frac{d^L-1}{N} . \tag{32}$$

Therefore if $N > 2^{L(2\ln d + \ln p_{max} + \eta)}$, then $\overline{F_L} \ge F_{max} - R\,2^{-L\eta}$ where $R = 2C/p_{max}$. This proves the intermediate 
result. 
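
The trade-off stated in the intermediate result can be made concrete with a short numerical sketch. It assumes, as the counting in bits elsewhere in the text suggests, that "ln" denotes a base-2 logarithm. The function names and the sample values (including the value used for the constant $C$ of eq. (11), which is not specified here) are hypothetical.

```python
import math

def outcome_threshold(d, p_max, L, eta):
    """Smallest integer N with N > 2**(L*(2*log2(d) + log2(p_max) + eta))."""
    exponent = L * (2 * math.log2(d) + math.log2(p_max) + eta)
    return math.floor(2 ** exponent) + 1

def fidelity_loss(L, eta, C, p_max):
    """The correction R * 2**(-L*eta) with R = 2*C/p_max."""
    return (2 * C / p_max) * 2 ** (-L * eta)

# Illustrative values: qubits with rho = I/2 (so p_max = 1/2), eta = 0.5, C = 1.
for L in (4, 8, 16):
    N = outcome_threshold(d=2, p_max=0.5, L=L, eta=0.5)
    print(L, N, fidelity_loss(L, eta=0.5, C=1.0, p_max=0.5))
```

As $L$ grows the number of outcomes grows exponentially, while the loss in fidelity decreases exponentially for any fixed $\eta > 0$.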

Note that if the input states are uniformly distributed in Hilbert space, i.e. $\rho = I/d$, then this intermediate result 
directly implies our main claim. Indeed when $\rho = I/d$, $p_{max} = 1/d$, and then $F_L \ge F_{max} - R\,2^{-L\eta}$ if $N > 2^{L(\ln d + \eta)} = 2^{L(I_{input} + \eta)}$. 
When the input states are not uniformly distributed in Hilbert space, we must use the notion of probable 
Hilbert space of a long sequence to prove our main result. This is done in the next section. 

XIII. MEASUREMENTS ON PROBABLE SUBSPACES 

We now combine the result of the previous section with the notion of probable subspace of large blocks of states. 
We first recall the properties of the probable subspace Q |J. Consider a long sequence of $L'$ input states $|\psi_{i_1}\dots\psi_{i_{L'}}\rangle$. 

The density matrix of these states is $\rho = \prod_{k=1}^{L'}\rho_k$. The projector $\Pi$ onto the probable subspace has the properties 
that given $\epsilon' > 0$, $\eta' > 0$, and for $L'$ sufficiently large, 

1. $\mathrm{Tr}\,\Pi\rho \ge 1 - \epsilon'$, i.e. the probability to be in the probable subspace is arbitrarily close to 1. 

2. $\Pi$ and $\rho$ commute, i.e. the eigenvectors of $\rho$ are either eigenvectors of $\Pi$ or of $1 - \Pi$. Furthermore the 
eigenvectors which are common to $\Pi$ and $\rho$ have eigenvalues bounded by $2^{-L'(H+\eta')} \le (\rho)_i \le 2^{-L'(H-\eta')}$. 

3. From these two properties it follows that the dimension of the probable Hilbert space is bounded by 
$(1-\epsilon')\,2^{L'(H-\eta')} \le \mathrm{Tr}\,\Pi \le 2^{L'(H+\eta')}$. 
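
The three properties above can be illustrated for a single-qubit density matrix that is diagonal with eigenvalues $(p, 1-p)$. The sketch below is an assumption-laden example: it takes the probable subspace to be spanned by the product eigenvectors whose eigenvalue lies in the window of property 2, uses base-2 logarithms, and picks $p = 0.9$, $L' = 200$, $\eta' = 0.1$ purely for illustration.

```python
import math

def typical_subspace_stats(p, Lp, eta):
    """Entropy H, dimension and weight of the probable subspace for rho = diag(p, 1-p)."""
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    lo, hi = 2 ** (-Lp * (H + eta)), 2 ** (-Lp * (H - eta))
    dim, weight = 0, 0.0
    for k in range(Lp + 1):                 # k = number of factors in the p-eigenstate
        eig = p ** k * (1 - p) ** (Lp - k)  # eigenvalue of the product eigenvector
        if lo <= eig <= hi:
            mult = math.comb(Lp, k)         # degeneracy of this eigenvalue
            dim += mult
            weight += mult * eig
    return H, dim, weight

H, dim, weight = typical_subspace_stats(p=0.9, Lp=200, eta=0.1)
print(H, weight)                             # entropy per state, Tr(Pi rho)
print(math.log2(dim) / 200)                  # log-dimension per state, at most H + eta
```

Here `weight` plays the role of $\mathrm{Tr}\,\Pi\rho = 1 - \epsilon'$, and `dim` is $\mathrm{Tr}\,\Pi$, which by construction satisfies both bounds of property 3.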
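Let us now show that measurements restricted to the probable subspace are arbitrarily close to optimal. Suppose 
that $A_j$ is a measurement that optimizes the state determination problem eq. (|l^) for sequences of $L'$ input states 
(for instance the measurement eq. (15)). Consider the POVM consisting of the operators $A'_j = \Pi A_j \Pi$ (to which we 
associate the unmodified guessed states $\phi_j^{guess}$) and the operator $I - \Pi$ (to which we associate the minimal value of 
the fidelity $f_{min}$). The fidelity for this measurement is 

$$\begin{aligned}
F_{L'} &= \sum_{i_1\dots i_{L'}}\sum_{j=1}^{N}\frac{1}{L'}\sum_{k=1}^{L'} p_{i_1}\dots p_{i_{L'}}\,\langle\psi_{i_1}\dots\psi_{i_{L'}}|\Pi A_j\Pi|\psi_{i_1}\dots\psi_{i_{L'}}\rangle\, f(\psi_{i_k},\phi^{guess}_{jk}) + f_{min}\,\mathrm{Tr}\,\rho(I-\Pi) \\
&\ge F_{max} - \sum_{i_1\dots i_{L'}}\sum_{j=1}^{N}\frac{1}{L'}\sum_{k=1}^{L'} p_{i_1}\dots p_{i_{L'}}\,\langle\psi_{i_1}\dots\psi_{i_{L'}}|(A_j - \Pi A_j\Pi)|\psi_{i_1}\dots\psi_{i_{L'}}\rangle\, f(\psi_{i_k},\phi^{guess}_{jk}) \\
&\quad + f_{min}\,\mathrm{Tr}\,\rho(I-\Pi) .
\end{aligned} \tag{33}$$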
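We bound the second term by 

$$\begin{aligned}
&\Big|\sum_{i_1\dots i_{L'}}\sum_{j=1}^{N}\frac{1}{L'}\sum_{k=1}^{L'} p_{i_1}\dots p_{i_{L'}}\,\langle\psi_{i_1}\dots\psi_{i_{L'}}|(A_j - \Pi A_j\Pi)|\psi_{i_1}\dots\psi_{i_{L'}}\rangle\, f(\psi_{i_k},\phi^{guess}_{jk})\Big| \\
&\quad\le f_{max}\sum_{j=1}^{N}\Big|\sum_{i_1\dots i_{L'}} p_{i_1}\dots p_{i_{L'}}\,\langle\psi_{i_1}\dots\psi_{i_{L'}}|(A_j - \Pi A_j\Pi)|\psi_{i_1}\dots\psi_{i_{L'}}\rangle\Big| \\
&\quad= f_{max}\sum_{j=1}^{N}\big|\mathrm{Tr}[\rho(A_j - \Pi A_j\Pi)]\big| \\
&\quad= f_{max}\,\mathrm{Tr}\Big[(\rho - \Pi\rho\Pi)\sum_{j=1}^{N}A_j\Big] = f_{max}\,\mathrm{Tr}\,\rho(I-\Pi) \\
&\quad\le \epsilon' f_{max}
\end{aligned} \tag{34}$$

where $f_{max}$ is the maximum value of the fidelity and we have used the fact that $\rho - \Pi\rho\Pi$ is a positive operator, and 
therefore that $\mathrm{Tr}[\rho(A_j - \Pi A_j\Pi)] \ge 0$, which allows us to remove the absolute value sign and put the sum over $j$ inside 
the trace. 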

Putting everything together we have 

$$F_{L'} \ge F_{max} - \epsilon'(f_{max} - f_{min}) . \tag{35}$$

This shows that the restriction of the measurement to the probable Hilbert space diminishes the fidelity by an 
arbitrarily small amount $\epsilon'(f_{max} - f_{min})$. 



We can now build a measurement which satisfies our main result as stated at the end of section VIII. We decompose 
the input states into blocks of $L'$ states. On each of these blocks we first carry out a partial measurement, $\Pi$ and $I - \Pi$, 
to know whether it is in the probable subspace or not. If the result is $I - \Pi$ the sequence is discarded. The sequences 
which pass the test are kept. 

We now take the sequences which have passed the test as the input states in the intermediate result. These sequences 
belong to a Hilbert space of dimension $\dim H_{probable} \le 2^{L'(I_{input}+\eta')}$ and the largest eigenvalue of their density matrix 
is $p_{max} \le 2^{-L'(I_{input}-\eta')}$. To apply the intermediate result, we take an integer $L$ and an $\eta > 0$. Then there exists a 
measurement on blocks of $L$ sequences which has a number of possible outcomes equal to any integer $N$ larger than 
$2^{L(L'(I_{input}+3\eta')+\eta)} = 2^{LL'(I_{input}+3\eta'+\eta/L')}$ and which has a fidelity $F \ge F_{max} - \epsilon'(f_{max} - f_{min}) - R\,2^{-L\eta}$, 
where $R$ is a positive constant. 

Let us calculate the entropy $I_{output}$ of the outputs of this measurement. We need less than $I_{\epsilon'} = -\epsilon'\ln\epsilon' - (1-\epsilon')\ln(1-\epsilon')$ 
bits to describe whether or not the input state passes the first test of belonging to the probable Hilbert space. 
If it does, then we need less than $\ln N$ bits to encode the output of the measurement on the $L$ blocks of probable 
sequences. Therefore the total number of bits we need to describe the outcome of this measurement on $LL'$ elementary 
input states is $I_{output} \le \ln N + L\,I_{\epsilon'}$. Replacing $N$ by its bound, we have $I_{output} \le LL'(I_{input} + 3\eta' + \eta/L' + I_{\epsilon'}/L')$. 
Since $\epsilon'$, $\eta'$ and $\eta$ can be chosen arbitrarily small, and $L'$ arbitrarily large, our claim is proven. 
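
The final counting argument can be followed numerically. The sketch below evaluates the upper bound on the output entropy per elementary input state, $I_{input} + 3\eta' + \eta/L' + I_{\epsilon'}/L'$, for increasing $L'$. It assumes base-2 logarithms, and the function names and sample values are hypothetical choices for illustration.

```python
import math

def binary_entropy(eps):
    """I_eps = -eps*log2(eps) - (1-eps)*log2(1-eps), in bits."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

def output_entropy_per_state(I_input, eta_p, eta, eps_p, Lp):
    """Upper bound on I_output / (L * L') from the final inequality of the text."""
    return I_input + 3 * eta_p + eta / Lp + binary_entropy(eps_p) / Lp

# Illustrative values: I_input = 0.469 bits, eta' = eta = eps' = 0.01.
for Lp in (10, 100, 1000, 10000):
    bound = output_entropy_per_state(0.469, 0.01, 0.01, 0.01, Lp)
    print(Lp, bound, bound - 0.469)          # the excess tends to 3 * eta'
```

The terms proportional to $1/L'$ vanish for long blocks, leaving an excess of $3\eta'$ over $I_{input}$ that can itself be made arbitrarily small.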

XIV. CONCLUSION 



In this paper we have obtained a quantitative estimate of how much information can be obtained by a quantum 
measurement. We considered optimal measurements, that is measurements which maximize a fidelity function. We 
then enlarged the set of optimal measurements in two ways. First we considered optimal measurements that act 
collectively on large blocks of input states rather than measurements restricted to act on each state separately. 
Secondly we did not require the fidelity of the measurements to be exactly equal to the optimal fidelity, but only that 
it be arbitrarily close to the optimal fidelity. In this context we showed that whatever property of a quantum system 
one wants to learn about, one can learn at most one bit of information about every qubit of quantum information 
carried by the unknown quantum system. That is, the Shannon entropy of the outcomes of optimal measurements 
can always be made less than or equal to the von Neumann entropy of the unknown quantum states. 